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A new method for estimating the present failure rate of a program 
is presented. A crude nonparametric estimate of the failure rate function 
is obtained from past failure times. This estimate is then smoothed by 
fitting a completely monotonic function, which is the solution of a 
quadratic programming problem. The value of the smoothed function at 
present time is used as the estimate of present failure rate. A Monte 
Carlo study gives an indication of how well this method works. 


Research Supported 
by 

National Aeronautics and Space Administration 
Grant NAG-1-179 




THE GEORGE WASHINGTON UNIVERSITY 
School of Engineering and Applied Science 
Washington, DC 20052 

Institute for Management Science and Engineering 


COMPLETELY MONOTONE REGRESSION ESTIMATES 
OF SOFTWARE FAILURE RATES 

by 

Douglas R. Miller 
Department of Operations Research 
The George Washington University 
Washington, DC 20052 

and 

Ariela Sofer 

Department of Systems Engineering 
George Mason University 
Fairfax, Virginia 22030 


GWU/ IMSE/ Serial T-497/85 
January 18, 1985 


Introduction 

Suppose a program contains some bugs each of which eventually 
manifests itself as a failure of the program to execute correctly. If 
each bug is removed when it causes a failure, failures should occur at 
a decreasing rate and the program exhibit reliability growth. Suppose 
successive failures occur at times 

,0 < t. < t < . . • < t (1) 

1 2 o n 

and the number of failures observed in [0»t] is denoted as n(t) , 0 < t ; 
Figure 1 shows an example of such data, (n(t) , 0 < t < t^} • Important 
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Figure 1 


Observed number of failures as a 
function of time; n(t) = number of 
failures in [0, t] . 
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problems in the area of software reliability focus on data as in (1) or 
Figure 1: we would like to make various statistical inferences from the 

data. Assuming a time-homogeneous usage environment, what can be said 
about future failures? How many failures are expected over some finite 
future horizon? What is the distribution of time until the next failure? 
What is the present failure rate of the program? (If bugs causing future 
failures are not removed at those failure times, the present failure rate 
should remain constant into the future.) Many papers in the software 
reliability literature address these and related issues. The purpose of 
this paper is to address the problem of estimating the present failure 
rate of a program. Our method is new; it consists of smoothing a raw 
estimate of the failure rate with a completely monotone function. 

Reliability Growth Models and Complete Monotonicity 

One approach to the inference problems arising from the observed 
failure times in (1) is postulation of probability models for the failure 
times. Such models include Jelinsky-Moranda [8], Goel-Okumoto [7], 
Duane-Crow [4,3], Littlewood [9], and Musa-Okumoto [13]. These are all 
parametric models. The general approach consists of selecting a particular 
model by goodness-of-f it , past quality-of-prediction, or some other 
criterion and then, using that model (with estimated parameters), predict 
the future or give estimates of the current failure rate. It is 
interesting to note that all the above models have a common property: 
complete monotonicity of failure rate function. 

Let N(t) equal the (random) number of failures observed in [0,t] 
and let M(t) = EN(t) be the expected number. The intensity function 
of the point process (N(t), 0 < t) is m(t) = ; M(t) , 0 < t ; m( # ) 

is also sometimes referred to as the failure rate of the process. A 
function m(*) is completely monotone if and only if it possesses 
derivatives of all orders and 

n n 

(-1) >0,0<t,n>0, (2) 

, n ~ 
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see Feller [5, p. A 39 ] . It is simple to verify that all of the above 
mentioned software reliability growth models satisfy (2). Futhermore 
if the failure times are modeled as order statistics of independent but 
not necessarily identically distributed Exponential random variables, 

(2) is also satisfied; see Miller [11]. Finally the class of completely 
monotone functions is identical to the totality of intensity functions 
for the family of doubly stochastic Exponential order statistic processes; 
see [11] again. Thus, the set of completely monotone intensity functions 
seems to be a natural basis for a nonparametric approach to estimating 
the failure rate. Our approach will be to find a completely monotone 
rate function which, in some sense, best fits the failure data in (1). 
Various specific formulations of the problem are possible; we present 
two closely related formulations here. 

Problem Statement - First Formulation 

Consider failure data as in (1). A raw estimate of the failure 
rate function is 

J(t) - 1 /-r • 1 ' c i+i ■ 1-1 n • (3) 

i+l i 


(This is a rather crude and naive estimate, but it does have some nice 

properties such as being totally nonparametric and it has appeared in the 

reliability literature, e.g. [12].) The above failure rate estimate is 

shown in Figure 2 for the data from Figure 1. (A very naive estimate of 

r(t ) would be r(t ) .) Our goal is to find the "closest" completely 
n n 

* vs y 

monotone function r (t) , 0 < t < t , to r(t) , 0 < t < t ; we 

- - n - - n 

shall use the criterion of "least-squares." It appears necessary to 
formulate the problem in discrete time. We shall use the failure times 

A A - * A - 

of (1): let r = r(t ) and r i = r (O • Th e least squares distance 

for the discretized problem is 

* 11 A * 2 
D(r , r ) = l (r ; . - r i ) (t i - t i _ 1 ) 

i=l 


- 4 - 


(4) 
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Raw estimated failure rate for data 
in Figure 1 sup erimposed on true rate 
r ( t) = 1/ AoOt 
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* 

There is some problem in carrying the complete monotonicity of r (•) 

over to the sequence ir^ , i=l,...,n} because of the unequal spacing 

between discrete time points. Thus we shall temporarily consider only 
the first two derivatives, which lead to 


* * 

* r i r i-l 

Ar. « r 1 < 0 

1 t. - t - 

i i-I 


i 2, . . . ,n 


(5) 


* * 

.2 * 4r i - 4r i-l > 0 

4 r i * T -t.' - 

1 l-l 


i 3 , • • ■ , n 


( 6 ) 


We now have two optimization problems: 


I. For given (r^ , i=l,2,...n} find {r^ , i=l,2,...n} which minimizes 

~ * * * * 

D(r , r ) subject to the constraints r > 0 , r. - r. < 0 , i=2,3,...,n 

n - i i-l - 

II. For given {r^ , i=l, 2, . . . ,n) find { r , i=l,2,...,n) which 

^ * * * * 
minimizes D(r , r ) subject to the constraints > 0 , - i\_^ f 0 » 

i=2,3,...,n , and (r* - r*_ 1 )/(t i - t^) > (r*^ - r *_ 2 ^^ t i_i " t ±-2 > ’ 


i 3,4, ... ,n • 


Note that the above statements both relax the constraint of complete 
mono tonicity considerably: Problem I requires only monotonicity and 

Problem II requires only convexity and monotonicity. 

The above problems are both quadratic programs. We have encoded an 
algorithm to solve them which is described in the Appendix. The solutions 
of Problems I and II for the data of Figure 1 were computed and are 
presented in Figures 3 and 4, respectively. The data of Figure 1 is 
actually Monte Carlo simulated data from a nonhomogeneous Poisson process 
with intensity r(t) = 1/ ^400t , 0 < t , a Duane-Crow model. This "true" 

failure rate function is shown on Figures 2, 3, and 4. In this example 


- 6 - 
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Figure 4. Convex, monotone estimate of failure rate for 
data in Fi gure 1 superimposed on true rate 
r (t) = 1/ \ 1 400 1 


- 8 - 







T-497 


present time is = 179,367. The true rate is r(t^g) = 1*18 x 10 . 

-4 

The rate estimated from the monotone function (Figure 3) is 0.63 x 10 

-4 

The rate estimated from the convex function (Figure 4) is 1.38 x 10 
These results seem promising. 

Note that Problem I above is the well-known "isotone-regression" 
problem described by Barlow, Bartholomew, Bremner and Brunk [1) and 
addressed in the reliability growth context by Campbell and Ott [2] 
and Nagel, Schloz, and Skrivan [14]. If the last interfailure time 
happens to come from the right tail of the interfailure time distribution 

. „ _ * 
r = r(t ) will underestimate r(t ) and the monotone constraint on r 
n n n 

will have no effect, leading to a negative bias ; imposing the additional 
constraint of convexity does have an effect as can be seen in Figures 
2, 3, and 4. In most software reliability applications a positively 
biased estimate of failure rate is safer than a negatively biased 
estimate, thus the convexity constraint seems to be desirable and the 
generalization of isotone regression to completely monotone regression 
worth pursuing. 

Problem Statement - Second Formulation 

The above formulation only considered first and second difference 
constraints. It is more straightforward to deal with higher differences 
if we formulate the problem in terms of equally-spaced discrete time- 
points. Let [0, t^] be divided into k intervals of equal length, 


As = t /k ; and define k time points s. = iAs , i=l,2,...,k 
n l 

smoothed version of the raw failure rate r(-) is 

s 

a, f 1 

r(t) =J r(s) ds/As , s^^ t t < • 

s 

i-1 


A 


(7) 


Let 


^ ^ - 
r i ■ r<s i > 


i=l, 2, k 


We use 



9 " 


, i=l, ,k) as the data 
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point in the least-squares regression problem. The optimal solution 
* 

is denoted as {r^ , i-l,2,...,k) as before; we constrain this solution 
to satisfy 

(-I)" 1 > 0 , m-1 <i<k,0<m<d, (8) 

where d corresponds to the maximum difference considered. The least 
squares distance is 

k 

'V * ^ * 2 

D(r , r ) = I (r. - r r . (9) 

i=l 1 1 

We have a family of quadratic programming problems (for d = 1,2,3,...): 

r\, * 

Minimize D(r , r ) subject to constraints in (8). - - 

We note that there are many possible formulation of the problem of 
finding the "best" completely monotone rate function. The two we have 
given seem to be closest to the standard formulations of isotone regression 
problems and therefore are logical initial approaches to this general 
problem. We plan to consider several other possible formulations in the 
future . 

A Monte Carlo Study of Performance 

We have chosen to do a more thorough investigation of the above 
second formulation. We wish to determine how accurately the present 
program failure rate (i.e. the failure rate at the end of the observed 
data) can be estimated by using the value of a least-squares completely 
monotone (up to d differences or derivatives) regression curve at that 
point. This is a complex inference procedure which is not amenable to 
analytic evaluation, therefore we use Monte Carlo simulation to estimate 
the average relative error and standard deviation of the estimator. 

For our initial set of simulated failure data we chose the 
Musa-Okumoto [13] Logarithmic Nonhomogeneous Poisson process. This 
model seems to describe software failures well and for different 
parameter values it covers a range from no growth to extreme growth. 


- 10 - 
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The mean function M(*) and the parameter values considered are shown 
in Table I and the corresponding mean functions are graphed in Figure 
5; the values of 6 were chosen so that the ith curve goes through 

(50-5i, 20+2i) , i=0,l 9 . The amount of growth can be seen from 

the curves or by comparing M(50) and M(100) - M(50) in Table I. 

Each data point generated consists of 40 observations from one of these 
M-0 processes. 

The results of the Monte Carlo study appear in Table II. We 
considered 7 different models: through P, . For each of the first 

five we generated 1000 data points, i.e., sample paths of 40 observations 
each; for the last two we generated 400 data points. For each data point 
we discretized the time interval into 40 equal subintervals and found 
the least-squares regression lines whose first d differences satisfied 
the completely monotone property, d = 1, 2, 3, 4, 5, and 6. We then 
computed the relative error 

e d " <4 ' r(t 40»' r(t 4O> <10) 

for each of the 6 values of d . We then computed the average and 
standard deviation of the relative errors over all 1000 (or 400) 
replicates for each value of d and 6 . For example, in Table II, 
for the monotone (d=l) least-squares estimate underestimated 

the true rate by 26.7 percent on the average and the relative error 

) 

had a standard deviation of 24.0 percent. 

The numbers in Table II merit some discussion. The first column 
(d=l) corresponds to the traditional isotone regression. We see a 
significant negative bias in the cases of small and moderate reliability 
growth. (The positive bias for B c and may reflect an 

j o 

inappropriateness of the mean as a measure of location in these cases 
more than anything else.) Note also that the variability, i.e., standard 
deviation, of the estimator goes from an acceptable to an unacceptable 
level as the growth becomes more extreme. Using higher differences 
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Table 1 

Cases of Musa-Okumoto Logarithmic Poisson Model 


M(t) 


40 


log (Bt + 1) 
log (looe + i) 



I 



-4 

M(50) 

e o 

= 

.100 

X 

10 

20.00 

B 1 




-1 


= 

.124 

X 

10 

23.93 



.429 


-i 


e 2 


X 

10 

27.51 

e 3 

= 

.131 



30.56 

g 4 

= 

.461 



33.02 

*5 

= 

.243 

X 

10 

34.99 

e 6 

= 

.311 

X 

10 2 

36.55 

P 7 

= 

.311 

X 

10 4 

37.81 

e 8 

= 

.100 

X 

10 9 

38.80 

e Q 

= 

.104 

X 

10 25 

39.54 


M(100) - M(50) 
20.00 
16.07 
12.49 

9.44 
6.98 
5.01 

3.45 

2.19 

1.20 

.46 
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Table II 


Performance of completely monotone least squares estimators of 
failure rate at 40th failure of M-0 model: Average relative 

error and (in parentheses) standard deviation of relative error 


£ 

d=l 

d=2 

d=3 

d=4 

d=5 

d=6 

v 

-.267 

(.240) 

-.051 

(.199) 

-.054 

(.205) 

-.055 

(.206) 

-.055 

(.206) 

-.065 

(.203) 

*1 

-.184 

(.307) 

.093 

(.290) 

.059 

(.326) 

.055 

(.328) 

.054 

(.328) 

.061 

(.315) 

B 2 

-.126 

(.366) 

.149 

(.395) 

.082 

(.442) 

.071 

(.441) 

.069 

(.439) 

.076 

(.434) 


-.067 

(.397) 

.186 

(.462) 

.106 

(.518) 

.092 

(.517) 

.086 

(.514) 

.089 

(.509) 

*4 

-.008 

(.439) 

.227 

(.519) 

.150 

(.577) 

.133 

(.579) 

.129 

(.575) 

.131 

(.570) 

B 5 

.071 

(.476) 

.277 

(-578) 

.209 

(.633) 

.187 

(.639) 

.180 

(.640) 

.179 

(.632) 

*6 

.141 

(.531) 

.347 

(.654) 

\ g 

.258 

(.723) 

.233 

(.730) 

.222 

(.728) 

.219 

(.723) 


The average and standard deviations are estimated from 1000 independent 
replicates for 3^ , * £3 and 8 ^ and from 400 independent 

replicates for B c and 

5 o 
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(d > 1) we see an improvement in the bias and a small increase in the 
variability for small and moderate growth situations. A positive bias 
i6 conservative when estimating the failure rate and therefore is preferred 
to a negative bias. Thus from Table II it seems to follow that estimators 
based on higher differences are preferred to the isotone estimators for 
small and moderate growth. For cases of more extreme growth it simply may 
be impossible to get good nonparametric estimates. 

In order to get some feeling for the significance of the variability 
of the above estimators we consider the problem of estimating the failure 
rate in a known time-homogeneous environment, i.e. no growth. Let 

Xi , . • .X^ be i.i.d. Exponential random variables with mean 1, then 

n 

X = n/ Z X. 
i-1 1 

is an estimator of the failure rate, and because X = 1 , X - 1 is an 
estimator of the relative error. It can be shown that 


E (X) 


n 

n-1 


E(X 2 ) 


2 


n 

(n-l)(n-2) 


2 

^ n 

Var(X) = r 

(n-l) Z (n-2) 


For n=40 , we get Var(A) = .0277 and the standard deviation equals .166 . 


Comparing this to the standard deviations in Table II for we see a 

surprisingly small difference; so we do not seem to lose much precision 
by using the nonparametric approach which makes no assumptions about 
time homogeneity. 

Finally, we note that the mean and standard deviation of the 
estimators do not give a complete picture of the performance. The 
complete distribution of relative errors is more revealing. Figure 6 
shows the empirical distributions of the 1000 relative errors observed 
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© 



Relative Error 


Figure 6. Cumulative distributions of relative errors for 
estimates of failure rate at 40th failure of 
M-0 model with 6 = = *131 based on d=l 

and d=2. (The distribution for d=l is to the 
left.) 
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for model 8^ using d=l and d=2 . From these plots we are tempted 

to prefer the convex (d=2) estimator to the monotone (d=l) one even 
though the absolute average error is three times larger. One reason for 
this choice is that the monotone estimator underestimates 62.6 percent 
of the time while the convex estimator underestimates 35.9 percent of 
the time. 

Conclusions and Future Work 

We have presented a new nonparametric method for estimating the 
current failure rate of a program, i.e. the failure rate at the end of 
a sequence of observed failures. A limited Monte Carlo study shows that 
the method works well for data sets with small or moderate growth but 
not for data sets with extreme growth. 

This initial study shows that the method has potential. We are 
pursuing several paths which should improve the method and allow it to 
achieve its full potential. We are looking at formulations in terms 
of the mean function instead of the failure rate function. Constraining 
the solution to be completely monotonic into the future may improve the 
estimate of present failure rate; this extension may lead to predictions 
of future reliability growth. Finally, it may be possible to develop a 
more efficient and more stable numerical algorithm. 


17 - 
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APPENDIX 


The Optimization Algorithm 


The optimization problems of the two formulations presented are 
both linearly constrained quadratic programming problems (in the 
variables r\) . A computer program was written for solving these 
problems, using a Newton type variable reduction algorithm . (Such 
algorithms are described in detail in McCormick [10] and in Gill et al . [6]. 
The program uses a Cholesky decomposition (see Gill et al [6]) to factorize 
the projected Hessian matrix. It should be noted that the system of 
equations (8) is equivalent to the reduced system of equations 


(-l) d A d r* > 0 


d+1 < i < k 


i i * 

(-1) 1 A 1 r k > 0 


0 < i < d-1 


( 11 ) 


This reduction to a system of non-redundant equations (prior to solution) , 
is necessary, since all algorithms for constrained optimization assume 
linear independence of the constraint gradients. 

For smaller values of d , the optimization program proved quite 
efficient. As examples, the running time for 900 problems of 40 variables 
each, with 6 = 0.1 and d=3 was under 3 CPU minutes on an IBM 4341, 
while 600 problems of 40 variables with 8 = 0.00001 and values of d 
varying from 1 to 6 took just under 1 CPU minute. However, for larger 
values of the maximal order of constraint differences d , numerical 
problems arise. As the value of d increases, the constraint coefficient 
matrix becomes increasingly ill conditioned. As a result, a solution 
which satisfies the system of equations (11) up to an acceptable 
tolerance (say, 10 ^) does not necessarily satisfy system (8) to an 
acceptable tolerance, and hence is unacceptable. Another problem 
encountered for larger d is that the program fails to complete the 
Cholesky decomposition of the projected Hessian, even though this matrix 
is known to be positive definite. 


18 
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Various steps can yet be taken to improve the efficiency of the 
program. Various measures to restore feasibility have been introduced, 
and these could still be improved. In addition, an orthogonal factoriza- 
tion, rather than a Cholesky decomposition could be used. This would tend 
to increase the numerical stability at the expense of increased running 
time. 


19 
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